450 research outputs found

    Comparative Analysis of Five XML Query Languages

    XML is becoming the most relevant new standard for data representation and exchange on the WWW. Novel languages for extracting and restructuring XML content have been proposed, some in the tradition of database query languages (e.g., SQL, OQL), others more closely inspired by XML. No standard XML query language has yet been decided, but the discussion is ongoing within the World Wide Web Consortium, many academic institutions, and major Internet-related companies. We present a comparison of five representative query languages for XML, highlighting their common features and differences. Comment: TeX v3.1415, 17 pages, 6 figures, to be published in ACM Sigmod Record, March 200

    False News On Social Media: A Data-Driven Survey

    In the past few years, the research community has dedicated growing interest to the issue of false news circulating on social networks. The widespread attention to detecting and characterizing false news has been motivated by the considerable real-world repercussions of this threat. Compared with traditional news outlets, social media platforms exhibit peculiar characteristics that have been particularly favorable to the proliferation of deceptive information. They also present unique challenges for all kinds of potential interventions on the subject. As this issue becomes of global concern, it is also gaining more attention in academia. The aim of this survey is to offer a comprehensive study of recent advances in the detection, characterization, and mitigation of false news that propagates on social media, as well as the challenges and open questions that await future research in the field. We use a data-driven approach, focusing on a classification of the features that each study uses to characterize false information and on the datasets used for training classification methods. At the end of the survey, we highlight the emerging approaches that look most promising for addressing false news.

    Topology comparison of Twitter diffusion networks effectively reveals misleading information

    In recent years, malicious information has grown explosively on social media, with serious social and political repercussions. Recent important studies, featuring large-scale analyses, have produced deeper knowledge about this phenomenon, showing that misleading information spreads faster, deeper, and more broadly than factual information on social media, where echo chambers and algorithmic and human biases play an important role in diffusion networks. Following these directions, we explore the possibility of classifying news articles circulating on social media based exclusively on a topological analysis of their diffusion networks. To this aim, we collected a large dataset of diffusion networks on Twitter pertaining to news articles published by two distinct classes of sources: outlets that convey mainstream, reliable, and objective information, and those that fabricate and disseminate various kinds of misleading articles, including false news intended to harm, satire intended to make people laugh, click-bait news that may be entirely factual, and rumors that are unproven. We carried out an extensive comparison of these networks using several alignment-free approaches, including basic network properties, distributions of centrality measures, and network distances. We then evaluated to what extent these techniques make it possible to discriminate between the networks associated with the aforementioned news domains. Our results highlight that the communities of users spreading mainstream news, compared to those sharing misleading news, tend to shape diffusion networks with subtle yet systematic differences, which might be effectively employed to identify misleading and harmful information. Comment: A revised new version is available on Scientific Report

    Search Computing

    Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation

    Background: Since the beginning of the COVID-19 pandemic, more than 1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support for the task and is not suited to expressing complex queries or compensating for missing specifications. Objective: This study aims to consider small graphs of concepts and exploit them for expressing graph searches over existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. Methods: We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths. Results: We built a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships. The GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries. It produces a bibliography of publications, which are globally ranked, and each publication is further associated with the specific parts of the query that it explains, thereby allowing the user to understand each aspect of the matching. Conclusions: Our approach supports the process of query formulation and evidence search over a large text corpus; it can be reapplied to any scientific domain where document corpora and curated ontologies are made available.

    Exploring the evolution of research topics during the COVID-19 pandemic

    The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economy, psychology, and so on. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years, collecting and indexing over one million articles. This corpus, however, does not provide an easy-to-access overview of its content. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including large language models), resulting in an architecture for clustering articles along orthogonal dimensions and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series, equipped with easy-to-drive statistical testing for analyzing the significance of topic emergence along arbitrarily selected time windows. Overall, our pipeline is very fast and its results match our expectations on topic identification (F1-score 0.854). The processes of data preparation and results visualization are completely general and virtually applicable to any corpus of textual documents, and are thus suited for effective adaptation to other contexts.

    Exploring the evolution of research topics during the COVID-19 pandemic

    The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economy, psychology, and so on. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years, collecting and indexing over one million articles. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including large language models), resulting in an architecture for clustering articles along orthogonal dimensions and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series, equipped with easy-to-drive statistical testing for analyzing the significance of topic emergence along arbitrarily selected time windows. The processes of data preparation and results visualization are completely general and virtually applicable to any corpus of textual documents, and are thus suited for effective adaptation to other contexts. Comment: 16 pages, 6 figures, 1 tabl